After running a regression model of some sort, the common way of interpreting the relationships between predictors and the outcome is by interpreting the regression coefficients. In many situations these interpretations are straightforward, however, in some settings the interpretations can be tricky, or the coefficients can simply not be interpreted.
These complications arise most commonly when a model involves an interaction or moderation. In these sort of models, interpretation of the coefficients requires special care due to the relationship between the various interacted predictors, and the set of information obtainable from the coefficients is often limited without resorting to significant algebra.
Marginal effects offer an improvement over simple coefficient interpretation. Marginal means allow you to estimate the average response under a set of conditions; e.g. the average response for each racial group, or the average response when age and weight are at certain values.
Marginal slopes estimate the slope between a predictor and the outcome when other variables are at some specific value; e.g. the average slope on age within each gender, or the average slope on age when weight is at certain values.
In either case, in addition to estimating these marginal effects, we can easily test hypotheses of equality between them via pairwise comparisons. For example, is the average response different by ethnicity, or is the slope on age different between genders.
If you are familiar with interpreting regression coefficients, and specifically interactions, you may have some idea of how to start addressing all the above examples. However, to complete answer these research questions would require more than simply a table of regression coefficients. With marginal effects, one additional set of results can address all these questions.
Finally, marginal effect estimation is a step towards creating informative visualizations of relationships such as interaction plots.
The webuse command can load the directly.
. webuse margex
(Artificial data for margins)
. summarize, sep(0)
Variable | Obs Mean Std. Dev. Min Max
-------------+---------------------------------------------------------
y | 3,000 69.73357 21.53986 0 146.3
outcome | 3,000 .1696667 .3754023 0 1
sex | 3,000 .5006667 .5000829 0 1
group | 3,000 1.828 .7732714 1 3
age | 3,000 39.799 11.54174 20 60
distance | 3,000 58.58566 181.3987 .25 748.8927
ycn | 3,000 74.47263 22.01264 0 153.8
yc | 3,000 .2413333 .4279633 0 1
treatment | 3,000 .5006667 .5000829 0 1
agegroup | 3,000 2.263667 .8316913 1 3
arm | 3,000 1.970667 .7899457 1 3
m <- haven::read_dta("http://www.stata-press.com/data/r15/margex.dta")
m <- data.frame(m)
m$sex <- as_factor(m$sex)
m$group <- as_factor(m$group)
head(m) y outcome sex group age distance ycn yc treatment agegroup arm
1 102.6 1 female 1 55 11.75 97.3 1 1 3 1
2 77.2 0 female 1 60 30.75 78.6 0 1 3 1
3 80.5 0 female 1 27 14.25 54.5 0 1 1 1
4 82.5 1 female 1 60 29.25 98.4 1 1 3 1
5 71.6 1 female 1 52 19.00 105.3 1 1 3 1
6 83.2 1 female 1 49 2.50 89.7 0 1 3 1
. regress y i.group
Source | SS df MS Number of obs = 3,000
-------------+---------------------------------- F(2, 2997) = 15.48
Model | 14229.0349 2 7114.51743 Prob > F = 0.0000
Residual | 1377203.97 2,997 459.527518 R-squared = 0.0102
-------------+---------------------------------- Adj R-squared = 0.0096
Total | 1391433.01 2,999 463.965657 Root MSE = 21.437
------------------------------------------------------------------------------
y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
group |
2 | .4084991 .8912269 0.46 0.647 -1.338979 2.155977
3 | 5.373138 1.027651 5.23 0.000 3.358165 7.38811
|
_cons | 68.35805 .6190791 110.42 0.000 67.14419 69.57191
------------------------------------------------------------------------------
. margins group
Adjusted predictions Number of obs = 3,000
Model VCE : OLS
Expression : Linear prediction, predict()
------------------------------------------------------------------------------
| Delta-method
| Margin Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
group |
1 | 68.35805 .6190791 110.42 0.000 67.14419 69.57191
2 | 68.76655 .6411134 107.26 0.000 67.50948 70.02361
3 | 73.73119 .8202484 89.89 0.000 72.12288 75.33949
------------------------------------------------------------------------------
Call:
lm(formula = y ~ group, data = m)
Residuals:
Min 1Q Median 3Q Max
-73.531 -14.758 0.101 14.536 72.569
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 68.3580 0.6191 110.419 < 2e-16 ***
group2 0.4085 0.8912 0.458 0.647
group3 5.3731 1.0277 5.229 1.83e-07 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 21.44 on 2997 degrees of freedom
Multiple R-squared: 0.01023, Adjusted R-squared: 0.009566
F-statistic: 15.48 on 2 and 2997 DF, p-value: 2.045e-07
group emmean SE df lower.CL upper.CL
1 68.4 0.619 2997 67.1 69.6
2 68.8 0.641 2997 67.5 70.0
3 73.7 0.820 2997 72.1 75.3
Confidence level used: 0.95
Assume a linear regression set up with a single categorical variable, \(G\), with three groups. We fit
\[ E(Y|G) = \beta_0 + \beta_1g_2 + \beta_2g_3, \]
where \(g_2\) and \(g_3\) are dummy variables representing membership in groups 2 and 3 respectively (\(g_{2i} = 1\) if observation \(i\) has \(G = 2\), and equal to 0 otherwise.) Since \(g_1\) is not found in the model, it is the reference category.
Therefore, \(\beta_0\) represents the average response among the reference category \(G = 1\). \(\beta_1\) represents the difference in the average response between groups \(G = 1\) and \(G = 2\). Therefore, \(\beta_0 + \beta_1\) is the average response in group \(G = 2\). A similar argument can be made about \(\beta_2\) and group 3.
. regress y i.sex##i.group
Source | SS df MS Number of obs = 3,000
-------------+---------------------------------- F(5, 2994) = 92.91
Model | 186898.199 5 37379.6398 Prob > F = 0.0000
Residual | 1204534.81 2,994 402.316235 R-squared = 0.1343
-------------+---------------------------------- Adj R-squared = 0.1329
Total | 1391433.01 2,999 463.965657 Root MSE = 20.058
------------------------------------------------------------------------------
y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
sex |
female | 21.62507 1.509999 14.32 0.000 18.66433 24.58581
|
group |
2 | 11.37444 1.573314 7.23 0.000 8.289552 14.45932
3 | 21.6188 1.588487 13.61 0.000 18.50416 24.73344
|
sex#group |
female#2 | -4.851582 1.942744 -2.50 0.013 -8.66083 -1.042333
female#3 | -6.084875 3.004638 -2.03 0.043 -11.97624 -.1935115
|
_cons | 50.6107 1.367932 37.00 0.000 47.92852 53.29288
------------------------------------------------------------------------------
. margins sex#group
Adjusted predictions Number of obs = 3,000
Model VCE : OLS
Expression : Linear prediction, predict()
------------------------------------------------------------------------------
| Delta-method
| Margin Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
sex#group |
male#1 | 50.6107 1.367932 37.00 0.000 47.92852 53.29288
male#2 | 61.98514 .7772248 79.75 0.000 60.46119 63.50908
male#3 | 72.2295 .8074975 89.45 0.000 70.64619 73.8128
female#1 | 72.23577 .63942 112.97 0.000 70.98203 73.48952
female#2 | 78.75863 .9434406 83.48 0.000 76.90877 80.60849
female#3 | 87.7697 2.468947 35.55 0.000 82.92869 92.6107
------------------------------------------------------------------------------
Call:
lm(formula = y ~ sex * group, data = m)
Residuals:
Min 1Q Median 3Q Max
-72.029 -13.285 0.464 13.741 67.964
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 50.611 1.368 36.998 < 2e-16 ***
sexfemale 21.625 1.510 14.321 < 2e-16 ***
group2 11.374 1.573 7.230 6.12e-13 ***
group3 21.619 1.588 13.610 < 2e-16 ***
sexfemale:group2 -4.852 1.943 -2.497 0.0126 *
sexfemale:group3 -6.085 3.005 -2.025 0.0429 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 20.06 on 2994 degrees of freedom
Multiple R-squared: 0.1343, Adjusted R-squared: 0.1329
F-statistic: 92.91 on 5 and 2994 DF, p-value: < 2.2e-16
group sex emmean SE df lower.CL upper.CL
1 male 50.6 1.368 2994 47.9 53.3
2 male 62.0 0.777 2994 60.5 63.5
3 male 72.2 0.807 2994 70.6 73.8
1 female 72.2 0.639 2994 71.0 73.5
2 female 78.8 0.943 2994 76.9 80.6
3 female 87.8 2.469 2994 82.9 92.6
Confidence level used: 0.95
Consider the simplest case, with two binary variables \(Z\) and \(K\). We fit the model
\[ E(Y|Z,K) = \beta_0 + \beta_1Z + \beta_2K + \beta_3ZK, \]
where \(ZK = 0\) only if both \(Z = 1\) and \(K = 1\).
This is functionally equivalent to defining a new variable \(L\),
\[ L = \begin{cases} 0, & K = 0 \& Z = 0 \\ 1, & K = 0 \& Z = 1 \\ 2, & K = 1 \& Z = 0 \\ 3, & K = 1 \& Z = 1, \end{cases} \]
and fitting the model
\[ E(Y|L) = \beta_0 + \beta_1l_1 + \beta_2l_2 + \beta_3l_3. \]
. regress y i.sex c.age
Source | SS df MS Number of obs = 3,000
-------------+---------------------------------- F(2, 2997) = 209.88
Model | 170938.657 2 85469.3283 Prob > F = 0.0000
Residual | 1220494.35 2,997 407.238688 R-squared = 0.1229
-------------+---------------------------------- Adj R-squared = 0.1223
Total | 1391433.01 2,999 463.965657 Root MSE = 20.18
------------------------------------------------------------------------------
y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
sex |
female | 14.03271 .7777377 18.04 0.000 12.50775 15.55766
age | -.5043666 .033698 -14.97 0.000 -.5704402 -.4382931
_cons | 82.78115 1.323613 62.54 0.000 80.18586 85.37643
------------------------------------------------------------------------------
. margins, at(age = (30 40))
Predictive margins Number of obs = 3,000
Model VCE : OLS
Expression : Linear prediction, predict()
1._at : age = 30
2._at : age = 40
------------------------------------------------------------------------------
| Delta-method
| Margin Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_at |
1 | 74.67586 .4947551 150.93 0.000 73.70576 75.64595
2 | 69.63219 .3684998 188.96 0.000 68.90965 70.35473
------------------------------------------------------------------------------
. margins sex, at(age = (30 40))
Adjusted predictions Number of obs = 3,000
Model VCE : OLS
Expression : Linear prediction, predict()
1._at : age = 30
2._at : age = 40
------------------------------------------------------------------------------
| Delta-method
| Margin Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_at#sex |
1#male | 67.65015 .5604889 120.70 0.000 66.55116 68.74913
1#female | 81.68285 .6911128 118.19 0.000 80.32775 83.03796
2#male | 62.60648 .537682 116.44 0.000 61.55222 63.66074
2#female | 76.63919 .533784 143.58 0.000 75.59257 77.68581
------------------------------------------------------------------------------
For each unique level of the continuous variable, a seperate emmeans call is needed apparently. If anyone knows a way around this, please let me know.
Call:
lm(formula = y ~ sex + age, data = m)
Residuals:
Min 1Q Median 3Q Max
-71.99 -13.88 -0.15 13.76 75.28
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 82.7811 1.3236 62.54 <2e-16 ***
sexfemale 14.0327 0.7777 18.04 <2e-16 ***
age -0.5044 0.0337 -14.97 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 20.18 on 2997 degrees of freedom
Multiple R-squared: 0.1229, Adjusted R-squared: 0.1223
F-statistic: 209.9 on 2 and 2997 DF, p-value: < 2.2e-16
1 emmean SE df lower.CL upper.CL
overall 74.7 0.495 2997 73.7 75.6
Results are averaged over the levels of: sex
Confidence level used: 0.95
1 emmean SE df lower.CL upper.CL
overall 69.6 0.369 2997 68.9 70.3
Results are averaged over the levels of: sex
Confidence level used: 0.95
Note the use of ~ 1. This argument (spec) cannot be excluded, but using ~ 1 basically ignores it and only focuses on the at.
sex emmean SE df lower.CL upper.CL
male 67.7 0.560 2997 66.6 68.7
female 81.7 0.691 2997 80.3 83.0
Confidence level used: 0.95
sex emmean SE df lower.CL upper.CL
male 62.6 0.538 2997 61.6 63.7
female 76.6 0.534 2997 75.6 77.7
Confidence level used: 0.95
. regress y i.sex##c.age
Source | SS df MS Number of obs = 3,000
-------------+---------------------------------- F(3, 2996) = 139.91
Model | 170983.675 3 56994.5583 Prob > F = 0.0000
Residual | 1220449.33 2,996 407.35959 R-squared = 0.1229
-------------+---------------------------------- Adj R-squared = 0.1220
Total | 1391433.01 2,999 463.965657 Root MSE = 20.183
------------------------------------------------------------------------------
y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
sex |
female | 14.92308 2.789012 5.35 0.000 9.454508 20.39165
age | -.4929608 .0480944 -10.25 0.000 -.5872622 -.3986595
|
sex#c.age |
female | -.0224116 .0674167 -0.33 0.740 -.1545994 .1097762
|
_cons | 82.36936 1.812958 45.43 0.000 78.8146 85.92413
------------------------------------------------------------------------------
. margins, at(age = (30 40))
Predictive margins Number of obs = 3,000
Model VCE : OLS
Expression : Linear prediction, predict()
1._at : age = 30
2._at : age = 40
------------------------------------------------------------------------------
| Delta-method
| Margin Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_at |
1 | 74.71541 .5089298 146.81 0.000 73.71752 75.71329
2 | 69.67359 .3890273 179.10 0.000 68.9108 70.43638
------------------------------------------------------------------------------
. margins sex, at(age = (30 40))
Adjusted predictions Number of obs = 3,000
Model VCE : OLS
Expression : Linear prediction, predict()
1._at : age = 30
2._at : age = 40
------------------------------------------------------------------------------
| Delta-method
| Margin Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_at#sex |
1#male | 67.58054 .5984013 112.94 0.000 66.40722 68.75386
1#female | 81.83127 .8228617 99.45 0.000 80.21784 83.4447
2#male | 62.65093 .5541361 113.06 0.000 61.56441 63.73746
2#female | 76.67755 .5461908 140.39 0.000 75.6066 77.74849
------------------------------------------------------------------------------
Call:
lm(formula = y ~ sex * age, data = m)
Residuals:
Min 1Q Median 3Q Max
-71.817 -13.848 -0.116 13.758 75.263
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 82.36936 1.81296 45.434 < 2e-16 ***
sexfemale 14.92308 2.78901 5.351 9.42e-08 ***
age -0.49296 0.04809 -10.250 < 2e-16 ***
sexfemale:age -0.02241 0.06742 -0.332 0.74
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 20.18 on 2996 degrees of freedom
Multiple R-squared: 0.1229, Adjusted R-squared: 0.122
F-statistic: 139.9 on 3 and 2996 DF, p-value: < 2.2e-16
1 emmean SE df lower.CL upper.CL
overall 74.7 0.509 2996 73.7 75.7
Results are averaged over the levels of: sex
Confidence level used: 0.95
1 emmean SE df lower.CL upper.CL
overall 69.7 0.389 2996 68.9 70.4
Results are averaged over the levels of: sex
Confidence level used: 0.95
sex emmean SE df lower.CL upper.CL
male 67.6 0.598 2996 66.4 68.8
female 81.8 0.823 2996 80.2 83.4
Confidence level used: 0.95
sex emmean SE df lower.CL upper.CL
male 62.7 0.554 2996 61.6 63.7
female 76.7 0.546 2996 75.6 77.7
Confidence level used: 0.95
. regress y i.sex c.age
Source | SS df MS Number of obs = 3,000
-------------+---------------------------------- F(2, 2997) = 209.88
Model | 170938.657 2 85469.3283 Prob > F = 0.0000
Residual | 1220494.35 2,997 407.238688 R-squared = 0.1229
-------------+---------------------------------- Adj R-squared = 0.1223
Total | 1391433.01 2,999 463.965657 Root MSE = 20.18
------------------------------------------------------------------------------
y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
sex |
female | 14.03271 .7777377 18.04 0.000 12.50775 15.55766
age | -.5043666 .033698 -14.97 0.000 -.5704402 -.4382931
_cons | 82.78115 1.323613 62.54 0.000 80.18586 85.37643
------------------------------------------------------------------------------
. margins, dydx(age)
Average marginal effects Number of obs = 3,000
Model VCE : OLS
Expression : Linear prediction, predict()
dy/dx w.r.t. : age
------------------------------------------------------------------------------
| Delta-method
| dy/dx Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
age | -.5043666 .033698 -14.97 0.000 -.5704402 -.4382931
------------------------------------------------------------------------------
Call:
lm(formula = y ~ sex + age, data = m)
Residuals:
Min 1Q Median 3Q Max
-71.99 -13.88 -0.15 13.76 75.28
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 82.7811 1.3236 62.54 <2e-16 ***
sexfemale 14.0327 0.7777 18.04 <2e-16 ***
age -0.5044 0.0337 -14.97 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 20.18 on 2997 degrees of freedom
Multiple R-squared: 0.1229, Adjusted R-squared: 0.1223
F-statistic: 209.9 on 2 and 2997 DF, p-value: < 2.2e-16
1 age.trend SE df lower.CL upper.CL
overall -0.504 0.0337 2997 -0.57 -0.438
Results are averaged over the levels of: sex
Confidence level used: 0.95
The Stata margins command to obtain marginal means includes the “dydx” option, which points to derivative - and indeed, this is exactly what is computer. If we have a basic regression model,
\[ E(Y|X) = \beta_0 + \beta_1X \]
taking the derivative with respect to \(X\) will obtain \(\beta_1\), which is the estimated coefficient.
If \(X\) enters the model in a more complex way, say a polynomial term:
\[ E(Y|X) = \beta_0 + \beta_1X + \beta_2X^2 \]
now the derivative is \(\beta_1 + 2\beta_2X\). Similar to marginal means, where the predicted outcome was estimated for each individual then those outcomes averaged, here the derivative is estimated plugging in each observation’s value of \(X\), then averaged.
. regress y i.sex##c.age
Source | SS df MS Number of obs = 3,000
-------------+---------------------------------- F(3, 2996) = 139.91
Model | 170983.675 3 56994.5583 Prob > F = 0.0000
Residual | 1220449.33 2,996 407.35959 R-squared = 0.1229
-------------+---------------------------------- Adj R-squared = 0.1220
Total | 1391433.01 2,999 463.965657 Root MSE = 20.183
------------------------------------------------------------------------------
y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
sex |
female | 14.92308 2.789012 5.35 0.000 9.454508 20.39165
age | -.4929608 .0480944 -10.25 0.000 -.5872622 -.3986595
|
sex#c.age |
female | -.0224116 .0674167 -0.33 0.740 -.1545994 .1097762
|
_cons | 82.36936 1.812958 45.43 0.000 78.8146 85.92413
------------------------------------------------------------------------------
. margins sex, dydx(age)
Average marginal effects Number of obs = 3,000
Model VCE : OLS
Expression : Linear prediction, predict()
dy/dx w.r.t. : age
------------------------------------------------------------------------------
| Delta-method
| dy/dx Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
age |
sex |
male | -.4929608 .0480944 -10.25 0.000 -.5872622 -.3986595
female | -.5153724 .0472435 -10.91 0.000 -.6080054 -.4227395
------------------------------------------------------------------------------
Call:
lm(formula = y ~ sex * age, data = m)
Residuals:
Min 1Q Median 3Q Max
-71.817 -13.848 -0.116 13.758 75.263
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 82.36936 1.81296 45.434 < 2e-16 ***
sexfemale 14.92308 2.78901 5.351 9.42e-08 ***
age -0.49296 0.04809 -10.250 < 2e-16 ***
sexfemale:age -0.02241 0.06742 -0.332 0.74
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 20.18 on 2996 degrees of freedom
Multiple R-squared: 0.1229, Adjusted R-squared: 0.122
F-statistic: 139.9 on 3 and 2996 DF, p-value: < 2.2e-16
sex age.trend SE df lower.CL upper.CL
male -0.493 0.0481 2996 -0.587 -0.399
female -0.515 0.0472 2996 -0.608 -0.423
Confidence level used: 0.95
Let’s assume we have a binary variable, \(Z\), interacted with a continuous variable, \(X\).
\[ E(Y|X,Z) = \beta_0 + \beta_1X + \beta_2Z + \beta_3XZ \]
Here we’re combining the math from marginal effects with marginal slopes. First, we generate two equations, one for \(Z = 0\) and one for \(Z = 1\):
\[\begin{align*} E(Y|X, Z = 0) & = \beta_0 + \beta_1X \\ E(Y|X, Z = 1) & = \beta_0 + \beta_1X + \beta_2 + \beta_3X = (\beta_0 + \beta_2) + (\beta_1 + \beta_3)X. \end{align*}\]
Then when we differentiate each with respect to \(X\), we obtain \(\beta_1\) and \((\beta_1 + \beta_3)\) respectively.
. regress y i.group
Source | SS df MS Number of obs = 3,000
-------------+---------------------------------- F(2, 2997) = 15.48
Model | 14229.0349 2 7114.51743 Prob > F = 0.0000
Residual | 1377203.97 2,997 459.527518 R-squared = 0.0102
-------------+---------------------------------- Adj R-squared = 0.0096
Total | 1391433.01 2,999 463.965657 Root MSE = 21.437
------------------------------------------------------------------------------
y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
group |
2 | .4084991 .8912269 0.46 0.647 -1.338979 2.155977
3 | 5.373138 1.027651 5.23 0.000 3.358165 7.38811
|
_cons | 68.35805 .6190791 110.42 0.000 67.14419 69.57191
------------------------------------------------------------------------------
. margins group, pwcompare(pv)
Pairwise comparisons of adjusted predictions Number of obs = 3,000
Model VCE : OLS
Expression : Linear prediction, predict()
-----------------------------------------------------
| Delta-method Unadjusted
| Contrast Std. Err. t P>|t|
-------------+---------------------------------------
group |
2 vs 1 | .4084991 .8912269 0.46 0.647
3 vs 1 | 5.373138 1.027651 5.23 0.000
3 vs 2 | 4.964638 1.041073 4.77 0.000
-----------------------------------------------------
Note that pairs is a generic, meaning that emmeans::pairs is an invalid call. Despite this, the “emmeans” package is required to call the below.
Call:
lm(formula = y ~ group, data = m)
Residuals:
Min 1Q Median 3Q Max
-73.531 -14.758 0.101 14.536 72.569
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 68.3580 0.6191 110.419 < 2e-16 ***
group2 0.4085 0.8912 0.458 0.647
group3 5.3731 1.0277 5.229 1.83e-07 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 21.44 on 2997 degrees of freedom
Multiple R-squared: 0.01023, Adjusted R-squared: 0.009566
F-statistic: 15.48 on 2 and 2997 DF, p-value: 2.045e-07
contrast estimate SE df t.ratio p.value
1 - 2 -0.408 0.891 2997 -0.458 0.6467
1 - 3 -5.373 1.028 2997 -5.229 <.0001
2 - 3 -4.965 1.041 2997 -4.769 <.0001
The adjust = "none" argument skips any multiple comparisons correction. This is done to obtain the same results as the regression coefficients. If you’d prefer a more ANOVA post-hoc approach, there are other options for adjust, see ?emmeans::contrast for details.
. regress y i.group##i.sex
Source | SS df MS Number of obs = 3,000
-------------+---------------------------------- F(5, 2994) = 92.91
Model | 186898.199 5 37379.6398 Prob > F = 0.0000
Residual | 1204534.81 2,994 402.316235 R-squared = 0.1343
-------------+---------------------------------- Adj R-squared = 0.1329
Total | 1391433.01 2,999 463.965657 Root MSE = 20.058
------------------------------------------------------------------------------
y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
group |
2 | 11.37444 1.573314 7.23 0.000 8.289552 14.45932
3 | 21.6188 1.588487 13.61 0.000 18.50416 24.73344
|
sex |
female | 21.62507 1.509999 14.32 0.000 18.66433 24.58581
|
group#sex |
2#female | -4.851582 1.942744 -2.50 0.013 -8.66083 -1.042333
3#female | -6.084875 3.004638 -2.03 0.043 -11.97624 -.1935115
|
_cons | 50.6107 1.367932 37.00 0.000 47.92852 53.29288
------------------------------------------------------------------------------
. margins group#sex, pwcompare(pv)
Pairwise comparisons of adjusted predictions Number of obs = 3,000
Model VCE : OLS
Expression : Linear prediction, predict()
------------------------------------------------------------------
| Delta-method Unadjusted
| Contrast Std. Err. t P>|t|
--------------------------+---------------------------------------
group#sex |
(1#female) vs (1#male) | 21.62507 1.509999 14.32 0.000
(2#male) vs (1#male) | 11.37444 1.573314 7.23 0.000
(2#female) vs (1#male) | 28.14793 1.661722 16.94 0.000
(3#male) vs (1#male) | 21.6188 1.588487 13.61 0.000
(3#female) vs (1#male) | 37.159 2.822577 13.16 0.000
(2#male) vs (1#female) | -10.25064 1.006447 -10.18 0.000
(2#female) vs (1#female) | 6.522856 1.13971 5.72 0.000
(3#male) vs (1#female) | -.0062748 1.030005 -0.01 0.995
(3#female) vs (1#female) | 15.53392 2.550404 6.09 0.000
(2#female) vs (2#male) | 16.77349 1.222358 13.72 0.000
(3#male) vs (2#male) | 10.24436 1.120772 9.14 0.000
(3#female) vs (2#male) | 25.78456 2.588393 9.96 0.000
(3#male) vs (2#female) | -6.529131 1.241826 -5.26 0.000
(3#female) vs (2#female) | 9.011069 2.643063 3.41 0.001
(3#female) vs (3#male) | 15.5402 2.597644 5.98 0.000
------------------------------------------------------------------
. margins sex@group, contrast(pv nowald)
Contrasts of adjusted predictions Number of obs = 3,000
Model VCE : OLS
Expression : Linear prediction, predict()
------------------------------------------------------------
| Delta-method
| Contrast Std. Err. t P>|t|
--------------------+---------------------------------------
sex@group |
(female vs base) 1 | 21.62507 1.509999 14.32 0.000
(female vs base) 2 | 16.77349 1.222358 13.72 0.000
(female vs base) 3 | 15.5402 2.597644 5.98 0.000
------------------------------------------------------------
To do: All pairwise comparisons
Call:
lm(formula = y ~ group * sex, data = m)
Residuals:
Min 1Q Median 3Q Max
-72.029 -13.285 0.464 13.741 67.964
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 50.611 1.368 36.998 < 2e-16 ***
group2 11.374 1.573 7.230 6.12e-13 ***
group3 21.619 1.588 13.610 < 2e-16 ***
sexfemale 21.625 1.510 14.321 < 2e-16 ***
group2:sexfemale -4.852 1.943 -2.497 0.0126 *
group3:sexfemale -6.085 3.005 -2.025 0.0429 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 20.06 on 2994 degrees of freedom
Multiple R-squared: 0.1343, Adjusted R-squared: 0.1329
F-statistic: 92.91 on 5 and 2994 DF, p-value: < 2.2e-16
group = 1:
contrast estimate SE df t.ratio p.value
male - female -21.6 1.51 2994 -14.321 <.0001
group = 2:
contrast estimate SE df t.ratio p.value
male - female -16.8 1.22 2994 -13.722 <.0001
group = 3:
contrast estimate SE df t.ratio p.value
male - female -15.5 2.60 2994 -5.982 <.0001
. regress y i.sex##c.age
Source | SS df MS Number of obs = 3,000
-------------+---------------------------------- F(3, 2996) = 139.91
Model | 170983.675 3 56994.5583 Prob > F = 0.0000
Residual | 1220449.33 2,996 407.35959 R-squared = 0.1229
-------------+---------------------------------- Adj R-squared = 0.1220
Total | 1391433.01 2,999 463.965657 Root MSE = 20.183
------------------------------------------------------------------------------
y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
sex |
female | 14.92308 2.789012 5.35 0.000 9.454508 20.39165
age | -.4929608 .0480944 -10.25 0.000 -.5872622 -.3986595
|
sex#c.age |
female | -.0224116 .0674167 -0.33 0.740 -.1545994 .1097762
|
_cons | 82.36936 1.812958 45.43 0.000 78.8146 85.92413
------------------------------------------------------------------------------
. margins sex, at(age = (20(10)60))
Adjusted predictions Number of obs = 3,000
Model VCE : OLS
Expression : Linear prediction, predict()
1._at : age = 20
2._at : age = 30
3._at : age = 40
4._at : age = 50
5._at : age = 60
------------------------------------------------------------------------------
| Delta-method
| Margin Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_at#sex |
1#male | 72.51015 .9336568 77.66 0.000 70.67947 74.34082
1#female | 86.985 1.22567 70.97 0.000 84.58175 89.38824
2#male | 67.58054 .5984013 112.94 0.000 66.40722 68.75386
2#female | 81.83127 .8228617 99.45 0.000 80.21784 83.4447
3#male | 62.65093 .5541361 113.06 0.000 61.56441 63.73746
3#female | 76.67755 .5461908 140.39 0.000 75.6066 77.74849
4#male | 57.72132 .8477402 68.09 0.000 56.05911 59.38353
4#female | 71.52382 .604927 118.24 0.000 70.33771 72.70994
5#male | 52.79171 1.262091 41.83 0.000 50.31706 55.26637
5#female | 66.3701 .9380502 70.75 0.000 64.53081 68.20939
------------------------------------------------------------------------------
. marginsplot, recastci(rarea)
Variables that uniquely identify margins: age sex
Call:
lm(formula = y ~ sex * age, data = m)
Residuals:
Min 1Q Median 3Q Max
-71.817 -13.848 -0.116 13.758 75.263
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 82.36936 1.81296 45.434 < 2e-16 ***
sexfemale 14.92308 2.78901 5.351 9.42e-08 ***
age -0.49296 0.04809 -10.250 < 2e-16 ***
sexfemale:age -0.02241 0.06742 -0.332 0.74
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 20.18 on 2996 degrees of freedom
Multiple R-squared: 0.1229, Adjusted R-squared: 0.122
F-statistic: 139.9 on 3 and 2996 DF, p-value: < 2.2e-16